Context-dependent acoustic modeling based on hidden maximum entropy model for statistical parametric speech synthesis

Authors

Soheil Khorram

Hossein Sameti

Fahimeh Bahmaninezhad

Simon King

Thomas Drugman

Abstract

Decision tree-clustered context-dependent hidden semi-Markov models (HSMMs) are typically used in statistical parametric speech synthesis to represent probability densities of acoustic features given contextual factors. This paper addresses three major limitations of this decision tree-based structure: (i) the decision tree structure lacks adequate context generalization; (ii) it is unable to express complex context dependencies; (iii) parameters generated from this structure exhibit sudden transitions between adjacent states. To alleviate these limitations, many earlier papers applied multiple decision trees with an additive assumption over those trees. The current study also uses multiple decision trees, but instead of the additive assumption it proposes training the smoothest distribution by maximizing an entropy measure. Increasing the smoothness of the distribution improves context generalization. The proposed model, named the hidden maximum entropy model (HMEM), estimates a distribution that maximizes entropy subject to multiple moment-based constraints. Due to the simultaneous use of multiple decision trees and the maximum entropy measure, the three aforementioned issues are considerably alleviated. Relying on HMEM, a novel speech synthesis system has been developed with maximum likelihood (ML) parameter re-estimation as well as maximum output probability parameter generation. Additionally, an effective and fast algorithm that builds multiple decision trees in parallel is devised. Two sets of experiments have been conducted to evaluate the performance of the proposed system. In the first set of experiments, HMEM with some heuristic context clusters is implemented. This system outperformed the decision tree structure on small training databases (i.e., 50, 100, and 200 sentences). In the second set of experiments, the performance of HMEM with four parallel decision trees is investigated using both subjective and objective tests. All evaluation results of the second experiment confirm a significant improvement of the proposed system over the conventional HSMM.
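To make the idea of combining several decision trees under moment constraints more concrete, here is a minimal sketch. It is not the paper's implementation: the abstract does not give HMEM's constraints or training procedure. The sketch relies only on the standard result that the maximum entropy distribution under first- and second-moment constraints is Gaussian, with density proportional to exp(eta1*x + eta2*x^2). It assumes, purely for illustration, that each tree's active leaf contributes one pair (eta1, eta2) and that the pairs are summed. The `Tree` class, the one-question trees, and all numeric values are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Context = Dict[str, str]
# Natural parameters (eta1, eta2) of a 1-D density proportional to
# exp(eta1 * x + eta2 * x**2); eta2 must be negative to be normalizable.
Natural = Tuple[float, float]


class Tree:
    """Hypothetical one-question stand-in for a context decision tree."""

    def __init__(self, question: Callable[[Context], bool],
                 yes_leaf: Natural, no_leaf: Natural) -> None:
        self.question = question
        self.yes_leaf = yes_leaf
        self.no_leaf = no_leaf

    def leaf(self, ctx: Context) -> Natural:
        # Return the parameters stored at the leaf this context falls into.
        return self.yes_leaf if self.question(ctx) else self.no_leaf


def combine(trees: List[Tree], ctx: Context) -> Tuple[float, float]:
    """Sum the active leaf parameters of all trees; return (mean, variance)."""
    eta1 = sum(t.leaf(ctx)[0] for t in trees)
    eta2 = sum(t.leaf(ctx)[1] for t in trees)
    if eta2 >= 0:
        raise ValueError("eta2 must be negative for a proper Gaussian")
    variance = -1.0 / (2.0 * eta2)   # from eta2 = -1 / (2 * variance)
    mean = eta1 * variance           # from eta1 = mean / variance
    return mean, variance


# Two illustrative trees, each asking about a different contextual factor.
phone_tree = Tree(lambda c: c["phone"] == "a", (1.0, -0.25), (0.0, -0.25))
stress_tree = Tree(lambda c: c["stress"] == "1", (3.0, -0.25), (1.0, -0.25))
trees = [phone_tree, stress_tree]

mean_a1, var_a1 = combine(trees, {"phone": "a", "stress": "1"})
mean_b0, var_b0 = combine(trees, {"phone": "b", "stress": "0"})
```

Because every context selects one leaf per tree, a combination of factors that never occurred jointly in training still receives a distribution built from leaves that were each observed separately, which is one way to read the context generalization argument above.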

Similar resources

Acoustic Modeling in Statistical Parametric Speech Synthesis – from HMM to LSTM-RNN

Statistical parametric speech synthesis (SPSS) combines an acoustic model and a vocoder to render speech given a text. Typically decision tree-clustered context-dependent hidden Markov models (HMMs) are employed as the acoustic model, which represent a relationship between linguistic and acoustic features. Recently, artificial neural network-based acoustic models, such as deep neural networks, ...

Statistical parametric speech synthesis with joint estimation of acoustic and excitation model parameters

This paper describes a novel framework for statistical parametric speech synthesis in which statistical modeling of the speech waveform is performed through the joint estimation of acoustic and excitation model parameters. The proposed method combines extraction of spectral parameters, considered as hidden variables, and excitation signal modeling in a fashion similar to factor analyzed traject...

Extracting MFCC, F0 feature in Vietnamese HMM-based speech synthesis

The HMM-based statistical speech synthesis method does not require a very large speech corpus for training the system. In this system, statistical modeling is applied to learn distributions of context-dependent acoustic vectors extracted from speech signals, each vector containing a suitable parametric representation of one speech frame and Vietnamese phonetic rules to synthesize speech. The method...

Discrete Duration Model for Speech Synthesis

The acoustic model and the duration model are the two major components in statistical parametric speech synthesis (SPSS) systems. The neural network-based acoustic model makes it possible to model phoneme duration at the phone level instead of the state level used in conventional hidden Markov model (HMM) based SPSS systems. Since the duration of phonemes is a countable value, the distribution of the phone-le...

Soft context clustering for F0 modeling in HMM-based speech synthesis

This paper proposes the use of a new binary decision tree, which we call a soft decision tree, to improve generalization performance compared to the conventional ‘hard’ decision tree method that is used to cluster context-dependent model parameters in statistical parametric speech synthesis. We apply the method to improve the modeling of fundamental frequency, which is an important factor in sy...

Journal:

EURASIP J. Audio, Speech and Music Processing

Volume 2014

Publication date: 2014
